System and method for extracting symbols from numeric time series for forecasting extreme events

ABSTRACT

A method for extracting symbols from a numeric time series comprises the steps of: receiving a finite time series of data elements for a particular application, the data elements characterized as having one or more sharp changes in values; for each sharp change in the finite time series, extracting a window of elements from the time series that precedes each sharp change; building a matrix from the time series window extracts; performing singular value decomposition on the built matrix to obtain characteristic matrices; and, obtaining vectors of symbols from resulting characteristic vectors determined from the singular value decomposition step, wherein the resulting symbols are used by a forecasting algorithm to predict a future sharp change in subsequent finite time series received for the application. The sets of finite symbols obtained may be used as the basis for applications requiring the prediction of an extreme change in a received time series.

BACKGROUND OF THE INVENTION

[0001] 1. Field of the Invention

[0002] This invention relates generally to systems and methods for recording and processing time series data, i.e., numeric data that are recorded in time sequence, and particularly to a system and method for predicting an extreme change in a time series, given past values of the series, using symbol extraction.

[0003] 2. Discussion of the Prior Art

[0004] One problem that often arises when recording and processing time series data is that the numeric data are too noisy. Noise can be due to measurement error of the time series, or due to process noise, which arises in situations where the data that are measured are subject to shocks due to the data generation process. For example, in a stock market scenario, where a future stock value (of a time series) is to be predicted given past values of the series, each stock price at a given time is due to the impact of every trader on the market. Thus, it is said that the measurements are subject to process noise. In these events, it is necessary to invoke “noise reduction strategies,” i.e., methods to reduce the noise present in the observations.

[0005] Signal processing technology is replete with noise reduction techniques. The following references outline some of the well-known noise reduction strategies: L.L. Scharf, Statistical Signal Processing: Detection, Estimation, and Time Series Analysis (New York: Addison-Wesley Publishing Co., 1990); L.L. Scharf, “The SVD and Reduced Rank Signal Processing,” Chapter 1 in The SVD and its Applications, R. Vaccaro, ed. (Elsevier, 1991); and Richard A. Roberts and Clifford T. Mullis, “Digital Signal Processing” (February 1987). One particular signal processing noise reduction technique is known as singular value decomposition, as described in the reference to C.R. Rao entitled “Linear Statistical Analysis and its Applications” (1963).

[0006] Another method for processing time series signals, in particular, utilizes moving average techniques to reduce noise. Moving averages are computed by taking subsets of sequences of numbers, computing the average of those numbers, recording the result, and then shifting the subset by one unit in time. Other noise reduction techniques include the application of a digital filter. Essentially, most of the noise reduction techniques rely on moving averages of the data, which do not generate symbol sequences.
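The moving average computation described above can be sketched as follows. This is a minimal illustration only; the function name moving_average and the sample values are hypothetical and do not come from the references cited.

```python
import numpy as np

def moving_average(values, width):
    """Average each run of `width` consecutive values, shifting by one step."""
    values = np.asarray(values, dtype=float)
    if width < 1 or width > values.size:
        raise ValueError("width must be between 1 and len(values)")
    # A uniform kernel turns the convolution into a sliding mean.
    kernel = np.ones(width) / width
    # mode="valid" keeps only positions where the subset is fully inside the data.
    return np.convolve(values, kernel, mode="valid")

# Hypothetical sample data, three-point subsets.
smoothed = moving_average([1.0, 2.0, 3.0, 4.0, 5.0], width=3)
```

The output is a shorter numeric series, not a sequence of symbols, which is the limitation noted above.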

[0007] It would be highly desirable to provide an improved method and mechanism for forecasting future time series values, and particularly extreme events, based on past time series data values.

[0008] It would additionally be highly desirable to provide an improved method for processing time series signals in which a time series is converted to a symbol sequence comprising sets of finite symbols which may be used as a basis for forecasting future time series values.

SUMMARY OF THE INVENTION

[0009] It is an object of the present invention to provide a method for extracting symbols from a numeric time series of data, which symbols provide the basis for forecasting values from future time series.

[0010] According to the invention, there is provided a method for extracting symbols from a numeric time series comprising the steps of: receiving a finite time series of data elements for a particular application, the data elements characterized as having one or more sharp changes in values; for each sharp change in the finite time series, extracting a window of elements from the time series that precedes each sharp change; building a matrix from the time series window extracts; performing singular value decomposition on the built matrix to obtain characteristic matrices; and, obtaining vectors of symbols from resulting characteristic vectors determined from the singular value decomposition step, wherein the resulting symbols are used by a forecasting algorithm to predict a future sharp change in subsequent finite time series received for the application.

[0011] Advantageously, the sets of finite symbols obtained may be used as the basis for applications requiring the prediction of an extreme change in a received time series.

BRIEF DESCRIPTION OF THE DRAWINGS

[0012] The foregoing objects and advantages of the present invention may be more readily understood by one skilled in the art with reference being had to the following detailed description of a preferred embodiment thereof, taken in conjunction with the accompanying drawings, in which:

[0013] FIG. 1(a) shows an example disaster indicator marking an extreme event relating to a time series of data.

[0014] FIGS. 1(b)-1(c) show respective example time series data signals e(t) and y(t) for determining the extreme behavior signals used for forecasting extreme events in accordance with the principles of the invention.

[0015] FIG. 2 illustrates the first six “extreme event” symbols that are derived for the example time series data according to the invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0016] According to the principles of the invention, assuming that a large time series of data is received having values in which a sharp (extreme) change is exhibited, the system of the present invention includes steps for:

[0017] (1) extracting a window of the time series that immediately precedes each sharp change;

[0018] (2) building a matrix of these time series extracts by placing each time series vector in row-major order;

[0019] (3) performing singular value decomposition on this matrix; and,

[0020] (4) using the resulting characteristic vectors as the symbols.

[0021] According to the invention, a sharp (extreme) change is defined as being a change of 30% in the time series over 10 units of time.

[0022] More particularly, the symbol extraction algorithm is as follows:

[0023] Given a finite time series of length T with values denoted as y(t), with N extreme changes at time steps {k(1), k(2), . . . , k(N)}, choose a window size W and compute the following:

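[0024] a) Set j=1 and let X be an empty matrix;

[0025] b) Let x(j)=[y(k(j)−W), y(k(j)−W+1), . . . , y(k(j)−1)];

[0026] c) Let X=[X; x(j)];

[0027] d) Set j=j+1 and repeat steps b) and c) until all N extreme changes have been processed.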

[0028] A typical value is W=25; however, as known to skilled artisans, the appropriate value depends upon the application. When steps a)-d) are performed, the result is an N×W matrix X.
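Steps a)-d) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each row is taken to hold the W samples immediately preceding an extreme change, time steps are zero-based array indices, and events with fewer than W preceding samples are skipped. The function name build_event_matrix and the sample data are hypothetical.

```python
import numpy as np

def build_event_matrix(y, event_times, W):
    """Stack the W samples preceding each extreme change as the rows of X."""
    y = np.asarray(y, dtype=float)
    rows = []
    for k in event_times:
        if k - W < 0:
            # Assumption: an event without W preceding samples is skipped.
            continue
        rows.append(y[k - W:k])  # samples y(k-W), ..., y(k-1)
    if not rows:
        return np.empty((0, W))
    return np.vstack(rows)

# Hypothetical series and event times, for illustration only.
y = np.arange(100.0)
X = build_event_matrix(y, event_times=[30, 60, 90], W=25)
```

Here X has one row per retained extreme change and W columns, matching the N×W shape described above.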

[0029] The next step involves computing the well known singular value decomposition (“svd”) of the resulting matrix X. That is:

[0030] e) perform [U, S, V]=svd(X)

[0031] Specifically, given matrix X, the well known mathematical transformation X=U S V^(T) is performed, where S is a diagonal N×W matrix holding the singular values, and U and V are the characteristic orthogonal matrices such that U is an N×N matrix and V is a W×W matrix. As known, the columns of the matrix U are the left singular vectors of the matrix X, and the columns of the matrix V (or, equivalently, the rows of V^(T)) are the right singular vectors of the matrix X.
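As an illustration, the decomposition can be computed with NumPy's numpy.linalg.svd. The sketch below uses random placeholder data in place of a real extreme events matrix. Note that NumPy returns the singular values as a vector and returns V^(T) rather than V, so both are rearranged to match the notation above.

```python
import numpy as np

# Placeholder data standing in for an N x W matrix of extracts (N=8, W=5).
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))

# full_matrices=True yields U as N x N and V^T as W x W.
U, s, Vt = np.linalg.svd(X, full_matrices=True)

# Rebuild the N x W diagonal matrix S from the vector of singular values.
S = np.zeros(X.shape)
S[:len(s), :len(s)] = np.diag(s)

# Columns of V are the right singular vectors.
V = Vt.T

reconstruction = U @ S @ V.T
```

The reconstruction equals X up to floating-point error, and the columns of V are orthonormal.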

[0032] According to the preferred embodiment of the invention, the symbols to be used for subsequent forecasting are held in the matrix V.

[0033] When a new time series arrives, sequential windows of the series are taken. A normalized dot-product of each window against the characteristic vectors is then computed. These scores (the normalized dot-products) become the new time series. The resultant scores have a better signal-to-noise ratio, and can therefore be used to predict whether or not the extreme change is going to occur in the time series. More particularly, the forecasting technique on the new series is as follows:

[0034] f) receive a new time series z(t);

[0035] g) generate windows of length W in the new time series;

[0036] h) compute the dot product of the windows against a predetermined number of columns of characteristic matrix V; and,

[0037] i) perform forecasting on the new series.

[0038] It has been experimentally determined that the dot product should be computed against the first five (5) columns of the characteristic matrix V. In theory, any number of columns between 1 and W may be used.
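Steps g) and h) can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name symbol_scores is hypothetical, and the normalization (dividing each window by its Euclidean norm, so that each score is the cosine between the window and a unit-length column of V) is one assumed reading of "normalized dot-product". The sketch relies on numpy.lib.stride_tricks.sliding_window_view, available in NumPy 1.20 and later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def symbol_scores(z, V, W, n_symbols=5):
    """Score every length-W window of z against the first columns of V."""
    z = np.asarray(z, dtype=float)
    symbols = V[:, :n_symbols]             # shape (W, n_symbols)
    windows = sliding_window_view(z, W)    # shape (len(z) - W + 1, W)
    # Assumed normalization: scale each window to unit Euclidean length.
    norms = np.linalg.norm(windows, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                # leave all-zero windows unchanged
    return (windows / norms) @ symbols     # shape (len(z) - W + 1, n_symbols)

# Hypothetical example: identity "symbols" make the scores easy to check.
V_demo = np.eye(4)
scores = symbol_scores([3.0, 4.0, 0.0, 0.0, 0.0], V_demo, W=4, n_symbols=2)
```

Each row of the result is the score vector for one window, and the sequence of rows forms the new series on which forecasting in step i) would be performed.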

[0039] An example application of the present invention is now provided. In the example, the closing prices of stocks from a well known stock index (Russell 2000) for a particular year were gathered. This amounted to about 255 days of data for each of 2000 stocks. For each of the time series, another time series is calculated that indicates the degree of extreme behavior that the time series exhibits.

[0040] Given a time series y(t), the extreme behavior signal is calculated as follows:

e(t)=(y(t−M)−y(t))/y(t−M)

[0041] Thus, if for a given value of t, e(t)=0.3, that means that the value of the time series y(t) changed by 30% in M time units.

[0042] Any time the signal e(t) crosses the threshold of 0.3, this is characterized as an extreme change. Note: the terms “disaster” and “extreme behavior” are used herein interchangeably.
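The extreme behavior signal and the threshold test can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, e(t) is left undefined (NaN) for the first M samples, and a "crossing" is taken to be the first time step at which e(t) reaches the threshold after having been below it. With the formula above, e(t) is positive when the series falls.

```python
import numpy as np

def extreme_behavior_signal(y, M):
    """Compute e(t) = (y(t-M) - y(t)) / y(t-M); undefined (NaN) for t < M."""
    if M < 1:
        raise ValueError("M must be at least 1")
    y = np.asarray(y, dtype=float)
    e = np.full(y.shape, np.nan)
    e[M:] = (y[:-M] - y[M:]) / y[:-M]
    return e

def extreme_change_times(e, threshold=0.3):
    """Return the time steps at which e(t) first reaches the threshold."""
    # Assumption: undefined values count as being below the threshold.
    above = np.nan_to_num(e, nan=0.0) >= threshold
    previous = np.concatenate(([False], above[:-1]))
    return np.flatnonzero(above & ~previous)

# Hypothetical price series: flat at 100, then a drop to 65 at t=12.
prices = [100.0] * 12 + [65.0] * 5
e = extreme_behavior_signal(prices, M=10)
events = extreme_change_times(e, threshold=0.3)
```

In this example the 35% drop is flagged once, at the time step where the signal first exceeds the threshold.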

[0043] Whenever the extreme behavior signal crosses the threshold of 0.3 (or equivalently 30%), the price data for the preceding 20 days (i.e., W=20 in this case) is extracted. FIG. 1(a) illustrates a disaster indication 50 for a company with ticker XXXX. FIG. 1(b) illustrates the extreme behavior signal e(t) indicating variations in stock price as compared to the 30% disaster threshold 60. The location of the disaster is indicated in FIG. 1(b) by arrow A. FIG. 1(c) illustrates the company XXXX price data signal y(t) for the example year and indicates the time window W that precedes the day when the threshold was exceeded. According to this example, this data 75 is gathered for the prior 20 days (W=20).

[0044] After going through all 2000 stocks, an extreme events library in the form of a matrix X is formed. Each row of the matrix X corresponds to the W values of the time series that precede an extreme event. Thus, if the total number of extreme events is N, the size of the matrix X is N×W.

[0045] The decomposition is performed using the standard algorithm for singular value decomposition. New matrices U, S, and V are returned according to the equation X=USV*, where V* denotes the (conjugate) transpose of V. The symbols reside in the columns of matrix V (equivalently, the rows of V*).

[0046] FIG. 2 illustrates the first six (6) “extreme event” symbols 101-106 that are derived for the stock index for that particular year according to the invention. Once the extreme event symbols are generated, a search is conducted for those symbols in another time series. If there is a window in time where the time series includes those symbols, it is likely that an extreme event will occur. The next step therefore includes mapping new signals onto the extreme event signals. For example, if it is desired to predict extreme events for the Russell 2000 stock index for the next year, the following steps are performed:

[0047] 1. Choose a time series;

[0048] 2. Extract a window of length W;

[0049] 3. Take the dot product (inner product) between the extracted window of the time series and the symbols;

[0050] 4. Implement a classifier to determine whether the values returned from the dot product are predictive of an extreme event.

[0051] If the values are close enough, an “extreme event” is likely to occur within the next M time units.
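The form of the classifier in step 4 is not specified. Purely as a hypothetical stand-in, the sketch below flags a window when the length of its score vector, i.e., the share of the unit-normalized window that lies in the span of the selected symbols, reaches a cutoff. The function name flag_windows and the cutoff value 0.9 are assumptions for illustration; in practice a classifier trained on labeled windows would likely be used instead.

```python
import numpy as np

def flag_windows(scores, closeness=0.9):
    """Flag windows whose score vector has Euclidean length >= closeness.

    `scores` holds one row per window: the dot products of the
    unit-normalized window against the selected columns of V.
    """
    scores = np.atleast_2d(np.asarray(scores, dtype=float))
    # With orthonormal symbols, this length lies between 0 and 1.
    captured = np.linalg.norm(scores, axis=1)
    return captured >= closeness

# Hypothetical score vectors for two windows.
flags = flag_windows([[0.6, 0.8], [0.1, 0.2]], closeness=0.9)
```

The first window lies almost entirely in the span of the symbols and is flagged; the second is not.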

[0052] While the invention has been particularly shown and described with respect to illustrative and preferred embodiments thereof, it will be understood by those skilled in the art that the foregoing and other changes in form and details may be made therein without departing from the spirit and scope of the invention, which should be limited only by the scope of the appended claims.

Having thus described our invention, what we claim as new, and desire to secure by Letters Patent is:
 1. A method for extracting symbols from a numeric time series comprising the steps of: a) receiving a finite time series of data elements for a particular application, said data elements characterized as having one or more sharp changes in values; b) for each sharp change in said finite time series, extracting a window of elements from said time series that precedes each sharp change; c) building a matrix from said time series window extracts; d) performing singular value decomposition on said built matrix to obtain characteristic vectors; and, e) obtaining a set of symbols from resulting characteristic vectors determined from said step d), wherein said resulting symbols are used by a forecasting algorithm to predict a future sharp change in subsequent finite time series received for said application.
 2. The method as claimed in claim 1, wherein each extracted window comprises a time series vector, said step c) of building a matrix includes placing each time series vector in row-major order.
 3. The method as claimed in claim 2, wherein said time series comprises T elements having values denoted as y(t), said time series of data further characterized as having N extreme changes at time steps {k(1), k(2), . . . , k(N)}, said step c) of building said matrix comprising the steps of: i) initialize j=1; ii) calculate x(j)=[y(k(j)−W), y(k(j)−W+1), . . . , y(k(j)−1)]; iii) calculate X=[X; x(j)]; iv) increment j and repeat steps ii)-iii) until the window for k(N) has been appended.
 4. The method as claimed in claim 3, wherein said step d) of performing singular value decomposition (svd) on said built matrix comprises computing: X=U S V^(T) where matrices U, S and V are said characteristic matrices, said symbols to be used for predicting a future sharp change in subsequent finite time series are held in the matrix V.
 5. The method as claimed in claim 4, further including the step of implementing a forecasting algorithm on a new received time series of data elements to determine whether a sharp change is expected in said new time series for said particular application, wherein prior to said forecasting step, said method comprises the steps of: generating time series vector windows of length W in the new time series; and computing a dot product of said time series vector windows against a predetermined number of columns of said characteristic matrix V.
 6. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform method steps for extracting symbols from a numeric time series, the method steps comprising: a) receiving a finite time series of data elements for a particular application, said data elements characterized as having one or more sharp changes in values; b) for each sharp change in said finite time series, extracting a window of elements from said time series that precedes each sharp change; c) building a matrix from said time series window extracts; d) performing singular value decomposition on said built matrix to obtain characteristic vectors; and, e) obtaining a set of symbols from resulting characteristic vectors determined from said step d), wherein said resulting symbols are used to predict a future sharp change in subsequent finite time series received for said application.
 7. The program storage device readable by a machine as claimed in claim 6, wherein each extracted window comprises a time series vector, said step c) of building a matrix includes placing each time series vector in row-major order.
 8. The program storage device readable by a machine as claimed in claim 7, wherein said time series comprises T elements having values denoted as y(t), said time series of data further characterized as having N extreme changes at time steps {k(1), k(2), . . . , k(N)}, said step c) of building said matrix comprising the steps of: i) initialize j=1; ii) calculate x(j)=[y(k(j)−W), y(k(j)−W+1), . . . , y(k(j)−1)]; iii) calculate X=[X; x(j)]; iv) increment j and repeat steps ii)-iii) until the window for k(N) has been appended.
 9. The program storage device readable by a machine as claimed in claim 8, wherein said step d) of performing singular value decomposition (svd) on said built matrix comprises computing: X=U S V^(T) where matrices U, S and V are said characteristic matrices, said symbols to be used for predicting a future sharp change in subsequent finite time series are held in the matrix V.
 10. The program storage device readable by a machine as claimed in claim 9, wherein said method step further includes the step of implementing a forecasting algorithm on a new received time series of data elements to determine whether a sharp change is expected in said new time series for said particular application, wherein prior to said forecasting step, said method comprises the steps of: generating time series vector windows of length W in the new time series; and computing a dot product of said time series vector windows against a predetermined number of columns of said characteristic matrix V.